ABSTRACT

Occasionally the prediction equation obtained by conventional
regression  techniques  is  an  unsatisfactory  predictor  because  of  its 
behavior  over  segments  of  the  range  of  the  independent  variable(s) . 

For  such  situations,  a  procedure  is  illustrated  which  has  been 
found  to  yield  a  "better  fit"  than  that  obtained  by  conventional 
regression  analysis.  The  procedure  consists  of  segmenting  the 
levels  of  the  independent  variable(s)  into  blocks  and  separately 
fitting  each  block.  The  separate  fits,  however,  are  obtained 
simultaneously  and  the  end  result  is  a  single  prediction  equation. 
Numerical  examples  are  given  typifying  regression  analysis  problems 
encountered  in  which  the  proposed  procedure  yields  a  "better  fit". 

In each example, the proposed procedure of blocking in regression
analysis is compared with conventional regression analysis. Extensions
in the application of blocking in prediction problems and in
comparative problems are briefly discussed.

I.  INTRODUCTION


The  principle  of  blocking  in  designed  experiments  conducted  for 
comparative  analysis  purposes,  namely  the  analysis  of  variance,  is  well 
established.  However,  the  principle  of  blocking  in  experiments  conducted 
for  prediction  purposes  does  not  appear  to  be  fully  utilized.  Indeed, 
a  physical  situation  dictates  the  same  restrictions  upon  experimentation 
conducted  for  prediction  purposes  as  for  comparative  analysis  purposes. 

That  is,  just  as  the  analysis  of  variance  is  determined  by  the  design  of 
the  experiment  so  should  regression  analysis  be  determined  by  the  design 
of  the  experiment.  In  addition  to  the  design  of  an  experiment,  another 
source  of  motivation  for  blocking  in  regression  analysis  is  the  demand 
from  the  experimenter  for  a  "better  fit".  Often  an  experimenter's  sole 
objective  is  to  find  a  mathematical  expression  that  "sufficiently  fits" 
his  data.  That  is,  he  is  not  interested  in  testing  hypotheses  concerning 
the  parameters  of  some  hypothesized  model;  instead,  he  is  interested  in 
the  behavior  of  a  mathematical  function  over  a  given  range  of  the  independent 
variable(s). This latter source of motivation initiated this report,
and its objective is to illustrate the application of blocking by employing
dummy variables in regression analysis to achieve a better fit than that
obtained by conventional regression analysis.

The use of a dummy variable in regression analysis is not new. Many
authors attach a dummy variable, which always takes the value of unity, to
the constant ($\beta_0$) for notational convenience, especially when using
matrix notation. Therefore, no pretension is made to the originality of
using dummy variables in regression analysis; instead, an attempt is made
to extend the use of dummy variables in regression analysis. Likewise,
the principle of blocking in regression analysis is not new; however, it
has received surprisingly little attention in recent literature.

Suits (1957) uses dummy variables in regression analysis of independent
variables  which  are  partitioned  (blocked)  into  mutually  exclusive  qualitative 
classifications.  Klopfenstein  (1964)  stresses  the  utility  of  segmenting 
data  in  his  discussion  of  the  solution  of  the  least  squares  approximation 
problem  subject  to  a  class  of  constraint  conditions.  Draper  and  Smith 
(1966)  fit  two  linear  trends  to  data  which  has  been  segmented  into  two 
blocks  and  illustrate  the  two  cases  of  known  and  unknown  point  of  intersection 
of  the  two  trends.  Smillie  (1966)  uses  dummy  variables  to  introduce 
qualitative  variables  into  a  regression  function  and  gives  a  numerical 
example  having  a  qualitative  factor  with  two  levels.  The  author  of  this 
report  feels  that  a  need  exists  for  a  more  thorough  exemplification  of 
the  utility  of  blocking  in  regression  analysis  than  that  illustrated  in 
the  current  literature. 

II.  BACKGROUND 

Knowledge  of  conventional  regression  analysis  is  assumed;  therefore, 
neither  the  historical  background  nor  the  theory  of  regression  analysis 
is discussed in detail. Instead, only definitions and/or explanations
are given of the terminology and notation used later in the report.

In prediction problems concerning regression analysis involving
a single dependent or response variable (y) and N independent variables
($x_1, x_2, \ldots, x_N$), the response variable is assumed to be normally
distributed about the "true" response function ($\eta$) with common variance
$\sigma^2$, where

$$\eta = \phi(x_1, x_2, \ldots, x_N) \tag{1}$$

is linear in the parameters.

The objective is to determine a prediction equation which "fits"
the given data with a prescribed degree of precision. This is accomplished
by using a postulated model,

$$y = \eta + e, \tag{2}$$

to estimate $\eta$, where e is a random error. Assuming the general multiple
linear regression model to be the postulated model, equation (2) is of
the form

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_N x_N + e, \tag{3}$$

where

    $y$ = the dependent variable
    $x_v$ = the independent variables; $v = 1, 2, \ldots, N$
    $\beta_0$ = a constant
    $\beta_v$ = the "true" partial regression coefficient of $x_v$; $v = 1, 2, \ldots, N$
    $e$ = a random error.

Some of the independent variables may not be actually observed variables;
for example, $x_2$ may equal $x_1^2$, $x_3$ may equal $x_1^3$, and so forth. In
particular, in the case of a single independent variable (x), the postulated
model may be an $n$th order polynomial and equation (3) becomes

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + e. \tag{4}$$

Applying the least squares principle by minimizing the sum of
squares of the deviations between the observed $y_i$ values and the $Y_i$
predictions yields unbiased estimates of the parameters of equation (3),
where

$$Y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_N x_{Ni}; \quad i = 1, 2, \ldots, n, \tag{5}$$

and where n is the number of observed dependent variable values.
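The least squares principle can be stated compactly in matrix form. The notation below is an assumption made here for illustration (the report writes the model in scalar form): X is the design matrix whose ith row is $(1, x_{1i}, \ldots, x_{Ni})$, $\mathbf{y}$ is the vector of observed responses, and $\mathbf{b} = (b_0, b_1, \ldots, b_N)'$. When X has full column rank, minimizing the sum of squared deviations leads to the usual normal equations:

```latex
\begin{aligned}
S(\mathbf{b}) &= \sum_{i=1}^{n} (y_i - Y_i)^2 = (\mathbf{y} - X\mathbf{b})'(\mathbf{y} - X\mathbf{b}), \\
X'X\,\mathbf{b} &= X'\mathbf{y}, \qquad \mathbf{b} = (X'X)^{-1} X'\mathbf{y}.
\end{aligned}
```

Because only the columns of X enter these equations, any column the analyst chooses to construct, including a dummy variable, is handled in the same way as an observed variable.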

Concerning the distribution of the random errors ($e_i$), the usual
assumptions of normally and independently distributed random errors with
mean zero and variance $\sigma^2$ are assumed throughout the following discussion
without further comment. For a complete discussion of the assumptions,
see, for example, Anderson and Bancroft (1952), Hald (1952), Bennett and
Franklin (1954), or Johnson and Leone, Volume I (1964).

Criteria for judging the "goodness of fit" of a prediction equation
are (or certainly should be) determined by the intended use of the prediction
equation. Some of the more common criteria are based on the magnitude of
the Coefficient of Multiple Determination ($R^2$), where R is the Multiple
Correlation Coefficient; a significance test of the "Lack of Fit"; and the
magnitude of the residuals, $e_i = y_i - Y_i$ or $\bar{e}_i = \bar{y}_i - Y_i$. These criteria
are used directly or indirectly in the NUMERICAL EXAMPLES Section where
comparisons are made of blocking in regression analysis with conventional
regression analysis.
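For reference, the first two criteria are commonly computed from the ANOVA sums of squares as sketched below. The SS and MS labels follow the ANOVA tables of this report; the F-ratio form is the standard one and assumes replicated observations, so that a "Within" mean square is available.

```latex
R^2 = \frac{SS(\text{Regression})}{SS(\text{Total})}, \qquad
F = \frac{MS(\text{Lack of Fit})}{MS(\text{Within})}
```

With the sums of squares reported in the ANOVA TABLE of the first numerical example, $R^2 = 6472.249 / 6535.059 \approx 0.9904$ for the conventional fit and $R^2 = 6525.430 / 6535.059 \approx 0.9985$ for the blocked fit.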
III.  NUMERICAL EXAMPLES

1.  One Curvilinear Trend and One Linear Trend

Consider  an  experiment  in  which  a  single  response  was  observed  from 
each  of  17  fixed  levels  of  a  given  independent  variable.  The  objective 
of  the  experiment  was  to  determine  a  simple  prediction  equation  (one  containing 
as  few  terms  as  possible)  for  the  response  variable.  For  acceptance  of  a 
prediction  equation,  the  residuals  were  to  be  within  a  prescribed  tolerance, 
i.e., $|y_i - Y_i| < \delta$; $i = 1, 2, \ldots, 17$. The "true" response function was known to
be monotonically increasing throughout the range of the independent
variable. Additionally, the rate of change of the response function was
increasing  over  a  portion  of  the  range  of  the  independent  variable,  while 
the  rate  of  change  was  nearly  constant  for  the  remainder  of  the  range  of 
the  independent  variable.  The  data  was  as  follows. 


Least  squares  polynomials  of  increasing  order  were  determined  in  the 
conventional  manner.  Prediction  equations  of  the  8th  order  and  less 
failed  to  satisfy  the  specified  tolerance.  In  addition,  a  prediction 
equation having more than five or six terms would have been impractical
for  the  intended  use  of  the  prediction  equation. 
An examination of a plot of the data suggested that the transition
from increasing to constant rate of change was between levels 6 and 8 of
the  independent  variable.  Therefore,  the  independent  variable  was  segmented 
into  two  blocks,  the  first  being  from  1  through  7  and  the  second  from  8 
through  17.  Constant,  linear,  and  quadratic  terms  were  fitted  for  the  first 
block,  and  a  linear  term  was  fitted  for  the  second  block. 

Before discussing the blocking procedure, the construction of the
design matrix is briefly discussed. In the design matrix of TABLE I, $x_1$
and $x_1^2$ refer to the first block, $x_2$ refers to the second block, and $x_3$
represents the difference between the blocks. In the first block the
elements of the $x_1$-column take the values of the original independent variable,
and in the second block the elements of the $x_1$-column take the first value
of the original independent variable in the second block. The elements
of the $x_1^2$-column are the squares of the elements in the $x_1$-column. In
the first block, the elements of the $x_2$-column take a zero; in the second
block, they take the value of the original independent variable minus the
first value of the original independent variable in the second block. The
elements of the $x_3$-column are assigned a zero in the first block and
assigned a one in the second block.
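The column rules described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the report's own program: the function name `two_block_design` and its `split` argument (the first level of the second block, 8 in this example) are hypothetical, and the response values are those listed in the y column of TABLE I.

```python
import numpy as np

def two_block_design(x, split):
    """Columns [1, x1, x1^2, x2, x3]: quadratic first block, linear second block."""
    x = np.asarray(x, dtype=float)
    second = x >= split                     # True for rows in the second block
    x1 = np.where(second, split, x)         # held at `split` throughout the second block
    x2 = np.where(second, x - split, 0.0)   # distance into the second block, 0 before it
    x3 = second.astype(float)               # 0/1 dummy: difference between the blocks
    return np.column_stack([np.ones_like(x), x1, x1**2, x2, x3])

x = np.arange(1, 18)
y = np.array([0, 1, 6, 10, 18, 28, 37, 40, 43, 43,
              46, 47, 51, 51, 55, 56, 59], dtype=float)

X = two_block_design(x, split=8)
b, *_ = np.linalg.lstsq(X, y, rcond=None)       # least squares estimates b0..b4
residual_ss = float(np.sum((y - X @ b) ** 2))   # "Lack of Fit" sum of squares

print(np.round(b, 4))
print(round(residual_ss, 3))
```

Although each block is fitted separately in effect, a single call to the least squares routine produces all five estimates at once, which is the sense in which the separate fits are obtained simultaneously.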

Note that the design matrix explained above and illustrated in
TABLE I is not the only design matrix that could have been used. That
is, the analyst is permitted flexibility in the construction of the design
matrix. The elements of the columns referring to the blocks could have
represented transformed or scaled values of the original independent variable.
Similarly, the zeros and ones in the $x_3$-column could have been assigned
differently. Naturally, a change in the construction of the design matrix
changes the interpretation of the estimated regression coefficients.
TABLE I


DESIGN MATRIX AND RESPONSE DATA

             Indep. Var.
Block        Index (i)      x     x1    x1^2    x2    x3     y

BLOCK I          1          1      1      1      0     0     0
                 2          2      2      4      0     0     1
                 3          3      3      9      0     0     6
                 4          4      4     16      0     0    10
                 5          5      5     25      0     0    18
                 6          6      6     36      0     0    28
                 7          7      7     49      0     0    37

BLOCK II         8          8      8     64      0     1    40
                 9          9      8     64      1     1    43
                10         10      8     64      2     1    43
                11         11      8     64      3     1    46
                12         12      8     64      4     1    47
                13         13      8     64      5     1    51
                14         14      8     64      6     1    51
                15         15      8     64      7     1    55
                16         16      8     64      8     1    56
                17         17      8     64      9     1    59
Four degrees of freedom were used for regression when blocking.
Therefore, the 4th order prediction equation, Y(C), obtained in the
conventional manner is compared with the prediction equation, Y(B),
obtained by blocking. The two prediction equations are:

$$Y(C) = -0.2115 - 3.0124x + 2.3254x^2 - 0.2169x^3 + 0.0061x^4$$
$$Y(B) = -0.5714 - 0.6310x_1 + 0.8690x_1^2 + 2.0667x_2 - 10.2000x_3$$

The MS(Lack of Fit) has been reduced to approximately one-sixth of its
conventional value (from 5.234 to 0.802), as can be seen in the
following ANOVA TABLE.


ANOVA TABLE

                        CONVENTIONAL              BLOCKING
Source         DF       SS          MS          SS          MS

Regression      4    6472.249    1618.062    6525.430    1631.358
Lack of Fit    12      62.810       5.234       9.629       0.802
Total          16    6535.059                6535.059

Figure 1 shows a plot of the data, the 4th order Y(C), and Y(B).
The dashed portion of Y(B) between the two blocks illustrates the
interpretation of the regression coefficient of $x_3$. The estimated
regression coefficient (-10.2000) of $x_3$ is the vertical shift between
the two trends of Y(B) at the first level of the second block. That
is, Y(B) given $x_1 = 8$, $x_2 = 0$, $x_3 = 1$ minus Y(B) given $x_1 = 8$, $x_2 = 0$, $x_3 = 0$
equals -10.2000.
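The arithmetic behind this interpretation can be written out from the reported coefficients of Y(B), evaluated at the first level of the second block (original level 8). The figures below are rounded to four decimals and are only a worked check, not additional results:

```latex
\begin{aligned}
Y(B)\,\big|_{x_1=8,\;x_2=0,\;x_3=0} &= -0.5714 - 0.6310(8) + 0.8690(64) = 49.9966, \\
Y(B)\,\big|_{x_1=8,\;x_2=0,\;x_3=1} &= 49.9966 - 10.2000 = 39.7966, \\
39.7966 - 49.9966 &= -10.2000 .
\end{aligned}
```

The first value is the quadratic trend of the first block extended to level 8; the second is the starting point of the linear trend of the second block, which lies close to the observed response of 40 at that level.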
[Figure 1]


A comparison of the residuals of Y(C) with the residuals of
Y(B) shows that the $e_i(C)$ range from -2.86 to 4.21 while the $e_i(B)$
range from -1.20 to 1.13, where $e_i(C) = y_i - Y_i(C)$ and $e_i(B) = y_i - Y_i(B)$.
Figure 2 shows a comparison of the residuals.
2.  Two Curvilinear Trends


Consider  another  experiment  in  which  35  measured  responses 
were  obtained  from  16  fixed  levels  of  an  independent  variable.  Again, 
the  objective  of  the  experiment  was  to  obtain  a  simple  prediction 
equation  for  the  response  variable.  In  addition,  the  prediction 
equation  must  possess  certain  characteristics,  the  most  important 
being  that  it  yield  non-negative  predictions  for  dependent  variable 
values  within  the  range  of  the  experiment.  Also,  the  true  response 
function  was  known  to  be  unimodal,  and  was  known  to  be  monotonically 
decreasing  for  increasing  independent  variable  values  to  the  right 
of  the  stationary  point.  The  data  was  as  follows. 


Independent        Dependent
Variable (x)       Variable (y)

     1.0            0.5    1.0    1.5
     1.5            6.0    8.0
     2.0           10.5   11.0   11.5
     2.5           12.0   13.0   14.0
     3.0           14.0   15.5
     3.5           15.0   16.0
     4.0           15.0   16.0   17.0
     4.5           15.0   15.0   16.0
     5.0           13.5   15.5
     5.5           10.5   11.0   11.0
     6.0            6.0    8.0
     7.0            4.0    4.5
     8.0            2.5    3.0
    10.0            1.5
    15.0            0.7
    20.0            0.1
Least squares fits were performed in the conventional manner,
obtaining polynomial expressions in the independent variable. Prediction
equations of the 9th order and less were found to be unsatisfactory
predictors. All curvilinear prediction equations yielded some negative
predictions corresponding to x-values within the range of the experiment.

An examination of a plot of the data showed that in the range of
1 to 6 of the independent variable, the response trend was curvilinear
and concave downward. But in the range of 6 to 20, the response trend
was curvilinear, concave upward, and asymptotic to the x-axis as x
increased. That is, the first trend appears as a portion of a parabola
opening downward, while the second trend appears as a portion of a
parabola opening to the right. Therefore, the independent variable was
segmented into two blocks. For the first block linear and quadratic
terms were included, and for the second block linear and square root
terms were included. The design matrix is shown in TABLE II.

Because five degrees of freedom were used for regression when
blocking, the 5th order prediction equation, Y(C), obtained in the
conventional manner is compared with Y(B) obtained by blocking:

$$Y(C) = -20.2869 + 26.9623x - 6.8518x^2 + 0.7012x^3 - 0.0319x^4 + 0.0005x^5$$

$$Y(B) = -10.8405 + 14.2910x_1 - 1.8808x_1^2 + 0.1773x_2 - 1.7921\sqrt{x_2} + 7.2348x_3$$


TABLE II

DESIGN MATRIX AND RESPONSE DATA

            Indep. Var.
Block       Index (i)     x      x1     x1^2    x2   sqrt(x2)   x3     y

BLOCK I         1        1.0    1.0     1.00     0     0         0     0.5   1.0   1.5
                2        1.5    1.5     2.25     0     0         0     6.0   8.0
                3        2.0    2.0     4.00     0     0         0    10.5  11.0  11.5
                4        2.5    2.5     6.25     0     0         0    12.0  13.0  14.0
                5        3.0    3.0     9.00     0     0         0    14.0  15.5
                6        3.5    3.5    12.25     0     0         0    15.0  16.0
                7        4.0    4.0    16.00     0     0         0    15.0  16.0  17.0
                8        4.5    4.5    20.25     0     0         0    15.0  15.0  16.0
                9        5.0    5.0    25.00     0     0         0    13.5  15.5
               10        5.5    5.5    30.25     0     0         0    10.5  11.0  11.0
               11        6.0    6.0    36.00     0     0         0     6.0   8.0

BLOCK II       12        7.0    7.0    49.00     0     0         1     4.0   4.5
               13        8.0    7.0    49.00     1     1         1     2.5   3.0
               14       10.0    7.0    49.00     3     1.732     1     1.5
               15       15.0    7.0    49.00     8     2.828     1     0.7
               16       20.0    7.0    49.00    13     3.606     1     0.1
A comparison of the "Lack of Fit" of Y(C) and Y(B) can be seen
from the ANOVA TABLE below. The MS[Lack of Fit of Y(B)] is approximately
one-fifth as large as the MS[Lack of Fit of Y(C)]. If a test were
performed, the MS[Lack of Fit of Y(C)] would be found to be significant
at the 0.01-level of significance, while the MS[Lack of Fit of Y(B)] is
obviously not significant.


ANOVA TABLE

                        CONVENTIONAL              BLOCKING
Source         DF       SS          MS          SS          MS

Regression      5    1037.961     207.592    1065.681     213.136
Lack of Fit    10      34.407       3.441       6.687       0.669
Within         19      13.708       0.721      13.708       0.721
Total          34    1086.076                1086.076
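The split of the residual variation into "Lack of Fit" and "Within" for the blocked fit can be sketched as follows. This is a minimal illustration in Python with NumPy, assuming the 35 responses of the data listing and the column layout of TABLE II; the function name `blocked_design` and its `split` argument are hypothetical.

```python
import numpy as np

# Responses observed at each level of x (35 responses at 16 levels).
data = {
    1.0: [0.5, 1.0, 1.5], 1.5: [6.0, 8.0], 2.0: [10.5, 11.0, 11.5],
    2.5: [12.0, 13.0, 14.0], 3.0: [14.0, 15.5], 3.5: [15.0, 16.0],
    4.0: [15.0, 16.0, 17.0], 4.5: [15.0, 15.0, 16.0], 5.0: [13.5, 15.5],
    5.5: [10.5, 11.0, 11.0], 6.0: [6.0, 8.0], 7.0: [4.0, 4.5],
    8.0: [2.5, 3.0], 10.0: [1.5], 15.0: [0.7], 20.0: [0.1],
}
x = np.array([level for level, ys in data.items() for _ in ys])
y = np.array([value for ys in data.values() for value in ys])

def blocked_design(x, split=7.0):
    """Columns [1, x1, x1^2, x2, sqrt(x2), x3], following TABLE II."""
    second = x >= split
    x1 = np.where(second, split, x)         # held at 7.0 in the second block
    x2 = np.where(second, x - split, 0.0)   # 0, 1, 3, 8, 13 in the second block
    return np.column_stack([np.ones_like(x), x1, x1**2, x2,
                            np.sqrt(x2), second.astype(float)])

X = blocked_design(x)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residual_ss = float(np.sum((y - X @ b) ** 2))

# "Within": scatter of the replicates about their own level means (pure error).
within_ss = sum(float(np.sum((np.array(ys) - np.mean(ys)) ** 2))
                for ys in data.values())
within_df = len(y) - len(data)            # 35 - 16 = 19
lof_ss = residual_ss - within_ss          # remainder is "Lack of Fit"
lof_df = len(data) - X.shape[1]           # 16 - 6 = 10
f_ratio = (lof_ss / lof_df) / (within_ss / within_df)

print(round(lof_ss, 3), round(within_ss, 3), round(f_ratio, 2))
```

An F ratio below one, as obtained here for the blocked fit, is consistent with the statement that the lack of fit of Y(B) is not significant.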

Figure 3 shows a plot of the data and the two prediction equations.
Note that Y(C) yields negative values at x = 10, 11, 12, 17, 18, 19. This,
in addition to being a "poor fit" at x = 10, illustrates the danger of
interpolation when the levels of the independent variable are unequally
weighted and/or nonequidistant. Again, the estimated regression
coefficient (7.2348) of $x_3$ is the vertical shift between the two trends
of Y(B) at the first level of the second block. Figure 3 also contains
a plot of the $\bar{e}_i = \bar{y}_i - Y_i$ differences, i.e., $\bar{e}_i(C) = \bar{y}_i - Y_i(C)$ and
$\bar{e}_i(B) = \bar{y}_i - Y_i(B)$, where $\bar{y}_i$ is the mean of the responses observed at
the ith level. These differences along with their corresponding
predicted values are tabulated in TABLE III which shows the range of
$\bar{e}_i(C)$ to be -2.08 to 1.89, while the range of $\bar{e}_i(B)$ is -0.64 to 0.91.


[Figure 3]


3.  Three  Linear  Trends 


This numerical example illustrates an extension of the blocking
principle to three blocks. The hypothetical example is for demonstration
of the procedure instead of comparison of blocking with conventional
regression analysis. Therefore, results are presented, and the comparison
is left to the reader. The data is as follows.

Independent Variable (x), Dependent Variable (y)

     x      y          x      y          x      y

     1     3.5         8     7.0        15     4.5
     2     4.5         9     7.0        16     5.5
     3     4.5        10     7.5        17     6.0
     4     5.0        11     7.5        18     7.0
     5     5.5        12     7.5        19     7.5
     6     6.0        13     8.5        20     8.5
     7     6.0        14     3.5        21     9.5

As  shown  in  TABLE  IV,  the  data  is  divided  into  three  blocks 
having  seven,  six,  and  eight  levels,  respectively. 
TABLE IV


DESIGN MATRIX AND RESPONSE DATA

             Indep. Var.
Block        Index (i)      x     x1    x2    x3    x4    x5      y

BLOCK I          1          1      1     0     0     0     0     3.5
                 2          2      2     0     0     0     0     4.5
                 3          3      3     0     0     0     0     4.5
                 4          4      4     0     0     0     0     5.0
                 5          5      5     0     0     0     0     5.5
                 6          6      6     0     0     0     0     6.0
                 7          7      7     0     0     0     0     6.0

BLOCK II         8          8      8     0     0     1     0     7.0
                 9          9      8     1     0     1     0     7.0
                10         10      8     2     0     1     0     7.5
                11         11      8     3     0     1     0     7.5
                12         12      8     4     0     1     0     7.5
                13         13      8     5     0     1     0     8.5

BLOCK III       14         14      8     6     0     1     1     3.5
                15         15      8     6     1     1     1     4.5
                16         16      8     6     2     1     1     5.5
                17         17      8     6     3     1     1     6.0
                18         18      8     6     4     1     1     7.0
                19         19      8     6     5     1     1     7.5
                20         20      8     6     6     1     1     8.5
                21         21      8     6     7     1     1     9.5
The two prediction equations are:

$$Y(C) = 5.0589 - 1.5715x + 0.6116x^2 - 0.0697x^3 + 0.0031x^4 + 0.00005x^5$$
$$Y(B) = 3.3571 + 0.4107x_1 + 0.2571x_2 + 0.8214x_3 + 0.2143x_4 - 4.7750x_5$$

The amount of variation "explained" by each prediction equation is
evidenced in the following ANOVA TABLE.


ANOVA TABLE

                        CONVENTIONAL           BLOCKING
Source         DF       SS        MS         SS        MS

Regression      5     40.124     8.025     55.005    11.001
Lack of Fit    15     15.662     1.044      0.781     0.052
Total          20     55.786               55.786

Figure  4  shows  a  plot  of  the  data  and  the  two  prediction  equations. 
[Figure 4]


IV.  EXTENSION  OF  APPLICATION 
1.  Prediction Problems

The extension to K blocks is a straightforward generalization
of the illustrations in Section III. An independent variable (x) having
N levels may be segmented into K blocks as shown in TABLE V. The
number of levels of the independent variable in the jth block is $N_j$,
where $\sum_{j=1}^{K} N_j = N$. Considering only linear terms in each block, the
model is

$$y = \beta_0 + \sum_{j=1}^{K} \beta_j x_j + \sum_{j'=K+1}^{2K-1} \beta_{j'} x_{j'} + e. \tag{6}$$

The estimates, $b_j$; $j = 1, 2, \ldots, K$, of the parameters of equation (6)
are the K slopes of the prediction equation; and the estimates,
$b_{j'}$; $j' = K+1, K+2, \ldots, 2K-1$, are the (K-1) vertical shifts between the
K blocks. Naturally if desired, higher order terms of the type
$\beta_{jk_j} x_j^{k_j}$; $j = 1, 2, \ldots, K$; $k_j = 1, 2, \ldots, N_j - 1$, may be included in the
model of equation (6). Further, the author sees no obvious complication
in generalizing the above to multiple independent variables. The
generalization appears to be an extension of the multiple regression
approach to the analysis of variance illustrated by Brownlee (1960)
or Draper and Smith (1966).
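The K-block design matrix for linear terms can be generated mechanically. The following Python sketch, using NumPy, is one possible construction that mirrors the layout of TABLE IV; the function name `k_block_linear_design` and the `starts` argument (the first level of each block) are hypothetical. It is applied to the three-block data of Section III.3 as a check.

```python
import numpy as np

def k_block_linear_design(x, starts):
    """Columns [1, K slope columns, K-1 shift columns] for K linear blocks.

    `starts` holds the first level of each block in increasing order.
    """
    x = np.asarray(x, dtype=float)
    K = len(starts)
    cols = [np.ones_like(x)]
    for j in range(K):                                  # K slope columns
        lo = -np.inf if j == 0 else starts[j]
        hi = starts[j + 1] if j + 1 < K else np.inf
        origin = 0.0 if j == 0 else starts[j]
        # 0 before the block, rising inside it, held constant after it
        cols.append(np.clip(x, lo, hi) - origin)
    for j in range(1, K):                               # K-1 shift (dummy) columns
        cols.append((x >= starts[j]).astype(float))
    return np.column_stack(cols)

x = np.arange(1, 22)
y = np.array([3.5, 4.5, 4.5, 5.0, 5.5, 6.0, 6.0,
              7.0, 7.0, 7.5, 7.5, 7.5, 8.5,
              3.5, 4.5, 5.5, 6.0, 7.0, 7.5, 8.5, 9.5])

X = k_block_linear_design(x, starts=[1, 8, 14])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residual_ss = float(np.sum((y - X @ b) ** 2))

print(np.round(b, 4))          # constant, three slopes, two vertical shifts
print(round(residual_ss, 3))
```

Holding each slope column constant after its block ends is what makes the coefficient of each dummy column a vertical shift measured at the first level of the following block.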
2.  Comparative Problems

In  addition  to  the  application  of  blocking  in  prediction  problems, 
the  procedure  has  application  in  the  analysis  of  variance  of  both  crossed 
and  nested  classifications.  As  an  example  of  the  application  in  crossed 
classifications, consider a simple 2X2X2 classification. The ANOVA
model may be written as

$$y = \mu + a_\alpha + b_\beta + c_\gamma + ab_{\alpha\beta} + ac_{\alpha\gamma} + bc_{\beta\gamma} + abc_{\alpha\beta\gamma} + e. \tag{7}$$

The corresponding REGRESSION model may be written as

$$y = \beta_0 + \sum_{v=1}^{7} \beta_v x_v + e. \tag{8}$$

Applying regression analysis by using the design matrix of TABLE VI
yields the analysis of variance for the three factor crossed classification.


TABLE VI

DESIGN MATRIX FOR A 2X2X2 CROSSED CLASSIFICATION

                     x4 =      x5 =      x6 =      x7 =
  x1    x2    x3     x1x2      x1x3      x2x3      x1x2x3

   1     1     1       1         1         1         1
   1     1     2       1         2         2         2
   1     2     1       2         1         2         2
   1     2     2       2         2         4         4
   2     1     1       2         2         1         2
   2     1     2       2         4         2         4
   2     2     1       4         2         2         4
   2     2     2       4         4         4         8

Note: TABLE VI is an illustration of blocking applied to three independent
variables ($x_1$ is segmented into two blocks, $x_2$ is segmented into two blocks
within each block of $x_1$, and $x_3$ is segmented into two blocks within each
block of $x_2$).
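The product columns of the crossed-classification design matrix can be generated directly from the three factor columns. The short Python sketch below (NumPy and the standard library only) is an illustration under the level coding 1 and 2 used in TABLE VI; the variable names are hypothetical.

```python
import itertools
import numpy as np

levels = (1, 2)
rows = []
for x1, x2, x3 in itertools.product(levels, repeat=3):
    # main-effect columns followed by the two- and three-factor products
    rows.append([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
design = np.array(rows)

# Prepend the unit column for the constant term of equation (8).
X = np.column_stack([np.ones(len(design)), design])
rank = int(np.linalg.matrix_rank(X))

print(design)
print(rank)
```

The eight columns, constant included, are linearly independent for the eight treatment combinations, so fitting the terms one at a time in the order shown assigns one degree of freedom to each source.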

The  correspondence  of  the  analysis  of  variance  for  the  models  of 
equations  (7)  and  (8)  is  illustrated  in  the  following  table. 
ANOVA SOURCE COMPARISON

ANOVA MODEL      REGRESSION MODEL
SOURCE           SOURCE                                        DF

A                Due to b1 | b0                                 1
B                Due to b2 | b0, b1                             1
C                Due to b3 | b0, b1, b2                         1
AB               Due to b4 | b0, b1, b2, b3                     1
AC               Due to b5 | b0, b1, b2, b3, b4                 1
BC               Due to b6 | b0, b1, b2, b3, b4, b5             1
ABC              Due to b7 | b0, b1, b2, b3, b4, b5, b6         1

For an illustration of the application of blocking in nested
classifications, consider a two factor experiment in which a three
level quantitative factor is nested within each of the three levels
of a qualitative factor. The data is displayed in TABLE VII.

TABLE VII

DATA TABLE FOR A NESTED CLASSIFICATION

Factor A                   1                2                3
Factor B within A      1   2   3        4   5   6        7   8   9

                       1   3   4        4   5   7        5   6   6
                       2   4   5        5   6   8        6   7   7

The ANOVA model may be written as

$$y = \mu + a_\alpha + b_{\beta(\alpha)} + e. \tag{9}$$

Factor A has (A-1) = 2 degrees of freedom; factor B(A) has (B-1)A = 6
degrees of freedom. Applying the usual ANOVA computational
procedures to the data in TABLE VII gives the following ANOVA TABLE.

ANOVA TABLE

Source      DF       SS        MS

A            2     32.444    16.222
B(A)         6     20.000     3.333
Within       9      4.500     0.500
Total       17     56.944

Before applying the proposed blocking procedure, the regression
model corresponding to the ANOVA model of equation (9) is briefly
discussed. The terms within the regression model, and consequently
the columns of the design matrix, are arranged differently from the
arrangement used in the preceding sections of this paper. This
rearrangement of terms within the regression model is merely for
convenience so that the terms referring to factor A precede the
terms referring to factor B within A (as they appear in the ANOVA
model of equation (9)). That is, the set of (K-1) terms represented
by the third term of equation (6) appears immediately after the constant
$\beta_0$. Consequently, the REGRESSION model is written as

$$y = \beta_0 + \underbrace{[\beta_{01} x_1 + \beta_{02} x_2]}_{\text{Factor A}} + \underbrace{\beta_{31} x_3 + \beta_{32} x_3^2 + \beta_{41} x_4 + \beta_{42} x_4^2 + \beta_{51} x_5 + \beta_{52} x_5^2}_{\text{Factor B within A}} + e. \tag{10}$$

The design matrix corresponding to equation (10) is shown in TABLE VIII.

TABLE VIII

Block        x1    x2    x3   x3^2    x4   x4^2    x5   x5^2      y

BLOCK I       0     0     1     1      0     0      0     0      1   2
              0     0     2     4      0     0      0     0      3   4
              0     0     3     9      0     0      0     0      4   5

BLOCK II      1     0     4    16      0     0      0     0      4   5
              1     0     4    16      1     1      0     0      5   6
              1     0     4    16      2     4      0     0      7   8

BLOCK III     1     1     4    16      3     9      0     0      5   6
              1     1     4    16      3     9      1     1      6   7
              1     1     4    16      3     9      2     4      6   7

The ANOVA resulting from application of the proposed blocking
procedure is given in the REGRESSION ANOVA TABLE. Testing the three
parameters ($\beta_{32}$, $\beta_{42}$, $\beta_{52}$) of the quadratic terms in equation (10) as
"Lack of Fit", we conclude that the departure from linearity is not
significant. That is, a prediction equation containing only linear terms of the
quantitative factor "adequately fits" the data in TABLE VII. The
resulting linear prediction equation is

$$Y = 0.1667 - 1.8333x_1 - 3.1667x_2 + 1.5000x_3 + 1.5000x_4 + 0.5000x_5$$

REGRESSION ANOVA TABLE

Source        DF       SS        MS

b01, b02       2     32.444    16.222
b31            1      9.000     9.000
b32            1      0.333     0.333
b41            1      9.000     9.000
b42            1      0.333     0.333
b51            1      1.000     1.000
b52            1      0.333     0.333
Within         9      4.500     0.500
Total         17     56.944
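The entries of this table can be reproduced by adding the terms of equation (10) to the model one group at a time and recording the drop in the residual sum of squares at each step. The Python sketch below, using NumPy, is an illustration of that sequential computation and not the author's program; it assumes the cell coding of TABLE VIII and the responses of TABLE VII, and the helper name `residual_ss` is hypothetical.

```python
import numpy as np

# Each cell: (x1, x2, x3, x4, x5) and its two observed responses.
cells = [
    ((0, 0, 1, 0, 0), (1, 2)), ((0, 0, 2, 0, 0), (3, 4)), ((0, 0, 3, 0, 0), (4, 5)),
    ((1, 0, 4, 0, 0), (4, 5)), ((1, 0, 4, 1, 0), (5, 6)), ((1, 0, 4, 2, 0), (7, 8)),
    ((1, 1, 4, 3, 0), (5, 6)), ((1, 1, 4, 3, 1), (6, 7)), ((1, 1, 4, 3, 2), (6, 7)),
]
rows, responses = [], []
for (x1, x2, x3, x4, x5), ys in cells:
    for value in ys:
        rows.append([1, x1, x2, x3, x3**2, x4, x4**2, x5, x5**2])
        responses.append(value)
X = np.array(rows, dtype=float)
y = np.array(responses, dtype=float)

def residual_ss(columns):
    """Residual sum of squares after regressing y on the chosen columns of X."""
    b, *_ = np.linalg.lstsq(X[:, columns], y, rcond=None)
    return float(np.sum((y - X[:, columns] @ b) ** 2))

# Terms enter in the order of equation (10).
steps = {"b01,b02": [1, 2], "b31": [3], "b32": [4], "b41": [5],
         "b42": [6], "b51": [7], "b52": [8]}
used, previous, sequential_ss = [0], residual_ss([0]), {}
for name, new_columns in steps.items():
    used = used + new_columns
    current = residual_ss(used)
    sequential_ss[name] = previous - current   # SS due to the newly added term(s)
    previous = current
within_ss = previous                           # left over after all nine parameters

# Prediction equation with linear terms only (quadratic columns dropped).
linear_only = [0, 1, 2, 3, 5, 7]
b_linear, *_ = np.linalg.lstsq(X[:, linear_only], y, rcond=None)

print({name: round(value, 3) for name, value in sequential_ss.items()})
print(round(within_ss, 3), np.round(b_linear, 4))
```

The sum of the three quadratic entries is the "Lack of Fit" tested against the Within mean square, and the sum of all six entries for the quantitative factor equals the B(A) sum of squares of the ANOVA TABLE.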

Figure 5 shows a plot of the prediction equation and illustrates
the interpretation of the estimated regression coefficients.
Note that application of the proposed blocking procedure enabled
the simultaneous performance of an analysis of variance and a regression
analysis. That is, in addition to the usual analysis of variance, a
prediction equation was simultaneously determined.

In summary, the procedure of blocking in regression by using
dummy variables provides the analyst much flexibility. This flexibility
is due largely to the analyst's control of the construction of the
design matrix. The elements of the design matrix may represent
either original or transformed values of the original independent
variable(s). Consequently, as illustrated in Section III.2, different
transformations may be performed on different segments of the
independent variable(s). In addition, the advantages afforded
by employing orthogonal polynomials in regression analysis may
be realized by constructing the columns of the design matrix to be
orthogonal. Finally, with respect to the application to general
analysis of variance problems, the author feels that the proposed
procedure contained in this report could serve as a basis for a
computer program applicable for the analysis of variance of both
orthogonal and nonorthogonal designs having quantitative and/or
qualitative factors in crossed and/or nested classifications.